This is a brief summary of the paper CHARAGRAM: Embedding Words and Sentences via Character n-grams (Wieting et al., EMNLP 2016), which I read and studied, written up here to organize my notes.

This paper proposes character n-gram embeddings for word similarity, sentence similarity, and tagging tasks.

In the character n-gram embedding model, each character n-gram is matched exactly against the vocabulary to look up its vector.

They also note that character n-grams capture aspects of word order and word co-occurrence in CHARAGRAM-PHRASE.

In CHARAGRAM-PHRASE, they treat white-space as a special token, which they regard as a signal of the beginning and end of a word.

In other words, they model a character-based textual sequence as \(x = \langle x_1, x_2, \dots, x_m \rangle\), which includes the space characters between words as well as special start-of-sequence and end-of-sequence characters.
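A minimal sketch of how such a character sequence can be turned into n-grams, keeping the spaces between words; the marker symbols `^` and `$` and the n-gram sizes are my own illustrative choices, not values from the paper:

```python
def char_ngrams(text, n_values=(2, 3), sos="^", eos="$"):
    """Extract character n-grams from a text, keeping the space
    characters and adding start-/end-of-sequence markers."""
    seq = sos + text + eos
    grams = []
    for n in n_values:
        for i in range(len(seq) - n + 1):
            grams.append(seq[i:i + n])
    return grams

print(char_ngrams("a dog", n_values=(2,)))
# → ['^a', 'a ', ' d', 'do', 'og', 'g$']
```

Note that because the space is kept, n-grams such as `'a '` and `' d'` span word boundaries, which is how the model can pick up signals about the beginnings and ends of words.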

The CHARAGRAM model embeds a character sequence \(x\) by adding the vectors of its character n-grams, followed by an elementwise nonlinearity.
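This composition can be sketched as follows. The toy vocabulary with random vectors is purely illustrative (the real model learns these parameters over a much larger n-gram vocabulary), and tanh stands in for the paper's elementwise nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 5
# Toy n-gram vocabulary with random vectors (illustrative only;
# the model learns these embeddings during training).
vocab = ["^a", "a ", " d", "do", "og", "g$", "dog"]
W = {g: rng.standard_normal(DIM) for g in vocab}
b = np.zeros(DIM)  # bias term

def char_ngrams(text, n_values=(2, 3), sos="^", eos="$"):
    seq = sos + text + eos
    return [seq[i:i + n]
            for n in n_values
            for i in range(len(seq) - n + 1)]

def charagram_embed(text):
    """Sum the vectors of the sequence's in-vocabulary character
    n-grams, add a bias, and apply an elementwise nonlinearity."""
    total = b.copy()
    for g in char_ngrams(text):
        if g in W:          # n-grams outside the vocabulary are skipped
            total += W[g]
    return np.tanh(total)

emb = charagram_embed("a dog")
print(emb.shape)  # (5,)
```

Because the embedding is just a sum over n-gram vectors, two sequences sharing many character n-grams end up close in the embedding space, which is the property the similarity experiments rely on.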

Reference

John Wieting, Mohit Bansal, Kevin Gimpel, and Karen Livescu. CHARAGRAM: Embedding Words and Sentences via Character n-grams. EMNLP 2016.